ABSTRACT

Benford's law (to base $B$) for an infinite sequence $\{x_k : k \ge 1\}$ of positive quantities $x_k$ is the
assertion that $\{\log_B x_k : k \ge 1\}$ is uniformly distributed (mod 1). The 3x + 1 function $T(n)$
is given by $T(n) = \frac{3n+1}{2}$ if $n$ is odd, and $T(n) = \frac{n}{2}$ if $n$ is even. This paper studies the initial
iterates $x_k = T^{(k)}(x_0)$ for $1 \le k \le N$ of the 3x + 1 function, where $N$ is fixed. It shows that for
most initial values $x_0$, such sequences approximately satisfy Benford's law, in the sense that
the discrepancy of the finite sequence $\{\log_B x_k : 1 \le k \le N\}$ is small.

Mathematics Subject Classification (2000) Primary: 11B83, Secondary: 11J71, 37A45, 60G10 



1. Introduction 

The 3x + 1 problem concerns the behavior under iteration of the map $T : \mathbb{Z} \to \mathbb{Z}$ given
by $T(n) = \frac{n}{2}$ or $T(n) = \frac{3n+1}{2}$ according as $n$ is even or odd. That is, $T(2m) = m$ and
$T(2m + 1) = 3m + 2$. The notorious 3x + 1 Conjecture asserts that when started from any
positive integer $n$, some iterate $T^{(k)}(n) = 1$; it remains unsolved. Surveys of work on this
problem appear in Lagarias [14] and Wirsching [26].
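As a concrete illustration of the map just defined, the following minimal Python sketch implements $T$ and lists the first $N$ iterates of a seed. The function names `T` and `iterates` are illustrative choices, not notation from the paper.

```python
def T(n):
    """3x + 1 function: n/2 for even n, (3n + 1)/2 for odd n."""
    return n // 2 if n % 2 == 0 else (3 * n + 1) // 2


def iterates(x0, N):
    """Return the list [T(x0), T^(2)(x0), ..., T^(N)(x0)]."""
    out = []
    x = x0
    for _ in range(N):
        x = T(x)
        out.append(x)
    return out


if __name__ == "__main__":
    # The seed 7 reaches 1 after eleven steps of T.
    print(iterates(7, 11))
```

The check $T(2m) = m$ and $T(2m+1) = 3m+2$ from the text can be confirmed directly with this function for any range of $m$.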

It is well known that the initial iterates of this map exhibit a "random" character. This
holds in the sense that the initial iterates of a randomly selected integer appear to be even or
odd with equal probability. Such a result can be rigorously justified if one takes the interval
$1 \le n \le X = 2^k$ and considers only the first $k = \log_2 X$ iterations (see [14, Theorem A]). This
leads to the rapid decay of most trajectories of the iteration under $T$, at an exponential rate,
with an expected decrease by a multiplicative factor $(3/4)^{1/2} \approx 0.86602$ at each step. These facts
support the conjecture that all orbits of the 3x + 1 iteration enter a bounded set, and hence fall
into a finite number of periodic orbits. Heuristic stochastic models (Lagarias and Weiss [15],
Borovkov and Pfeifer) predict that for an integer of size about $X$ the "random" character
above persists for about the first $\alpha \log X$ iterates, with $\alpha = (\frac{1}{2} \log \frac{4}{3})^{-1} \approx 6.95212$; the model
predicts most integers of size near $X$ will arrive at the periodic orbit $\{1, 2\}$ near this number
of iterations. The stochastic model in [15] also predicts that for large $n$ the number of steps
to enter a periodic orbit should never exceed $42 \log n$. Experimentally, Roosendaal [23] has
found a number $n$ of size $7.2 \cdot 10^{21}$ which requires about $36.7 \log n$ iterations before entering
the periodic orbit $\{1, 2\}$.

The present paper concerns the base $B$ expansion of the initial sequence of the first $N$
iterates of the 3x + 1 map on a random starting value $n$, drawn from $1 \le n \le X$ where
$X \ge 2^N$. This is in the region of the dynamics where most trajectories are decreasing at an
exponential rate, before they enter a periodic orbit. It shows that, in a certain sense, the
leading digits of the base $B$ expansion of most such sequences approximately satisfy a strong
form of Benford's law. Here Benford's law concerns the distribution of the initial digits in
the base $B$ expansion of an infinite sequence $\mathcal{X} = \{x_1, x_2, x_3, \ldots\}$ of positive real numbers.
The original version of Benford in 1938 concerned the first few leading digits in the decimal
expansion of real numbers in tables; the distribution had already been formulated by Newcomb
[20] in 1881. An infinite sequence $\mathcal{X}$ is said to satisfy the strong Benford's law (to base $B$)
if for each fixed $k \ge 1$, the first $k$ digits in the $B$-ary expansion of $\{x_1, x_2, x_3, \ldots\}$ approach
limiting probabilities given by the "$B$-ary Benford distribution", which we specify below. This
is known to be equivalent to the condition that the associated infinite sequence $y_i := \log_B x_i$
is uniformly distributed modulo one (Diaconis, Theorem 1). In what follows we adopt this
criterion as our definition of Benford's law.

This paper is motivated by work of Kontorovich and Miller, who showed that certain
statistics drawn from 3x + 1 iterates approximately obey Benford's law. They treated a version
of the 3x + 1 iteration in which the initial starting point $w_0$ is an odd integer, and they studied
the subset of the successive odd integers $\{w_1, w_2, \ldots\}$ appearing in the 3x + 1 iteration of $w_0$.
Here $w_i = T^{(k_i)}(w_0)$ where $k = k_i$ is the $i$-th value where $T^{(k)}(w_0)$ is odd. They showed that
for a suitable natural initial distribution on the odd integers drawn from $1 \le w_0 \le X$, and for
a suitable number $k$ of iterates (growing slowly with $X$), as $X \to \infty$ the distribution of the
$B$-ary digits of the ratios $w_k/w_0$ approached the $B$-ary Benford distribution, provided that $B$
was not a power of 2. More precisely, they obtained the Benford distribution in a double limit,
in which $X \to \infty$ with $k$ held fixed, and after this taking $k \to \infty$. They also gave results of
numerical simulations indicating that the distribution of the odd 3x + 1 iterates $\{w_1, w_2, \ldots, w_k\}$
starting from an odd $w_0$ themselves should approximately satisfy Benford's law, for all integer
bases $B$ not a power of 2. In the case where $B$ is a power of 2, they showed that a double
limiting distribution exists, but is not the $B$-ary Benford distribution.

The main result of this paper, Theorem 2.1 in §2, establishes in a quantitative form the
assertion that most initial sequences of the first $N$ iterates of the 3x + 1 function approximately
satisfy the strong Benford law. It applies to finite sequences of initial 3x + 1 iterates
$\{x_1, x_2, \ldots, x_N\}$, and obtains an upper bound on the discrepancy $D(\{y_1, y_2, \ldots, y_N\})$ of the
sequence of numbers $y_j = \log_B x_j$ for most such sequences. The discrepancy is a well-known
statistic which is a measure of distance to the uniform distribution. It is defined in §2, and
relevant properties of discrepancy are treated in §3. We obtain an explicit upper bound on the
number of "exceptional" sequences for which the discrepancy is large. We treat 3x + 1 iterates
including both even and odd iterates, and our main result implies convergence to a generalized
Benford's law for all bases $B \ge 2$, including $B$ being a power of 2. The anomalous behavior
of powers of 2 in the results of Kontorovich and Miller is associated to their restriction to
the subset of iterates that are odd integers.

The basic approach is as follows. We use the fact that the initial iterates of a large randomly
chosen integer $n$ are well approximated by a stochastic process that takes $T(n) = \frac{n}{2}$ or $T(n) = \frac{3n}{2}$
with equal probability. Taking logarithms to the base $B$, we are reduced to studying the
stochastic process which sets either

$$y_{n+1} = y_n + \theta_1$$

or

$$y_{n+1} = y_n + \theta_2$$

with equal probability, where

$$\theta_1 = \log_B \frac{3}{2} \quad \text{and} \quad \theta_2 = \log_B \frac{1}{2}.$$

In §4 we consider this process in its own right, for arbitrary $(\theta_1, \theta_2)$. We first show that the
realizations

$$\omega = \{y_n : n = 1, 2, 3, \ldots\}$$

of such a stochastic process for general $(\theta_1, \theta_2)$ are uniformly distributed modulo one with
probability one, if and only if at least one of $\theta_1$ or $\theta_2$ is irrational. The main result of §4 shows
that if the numbers $\theta_1$ and $\theta_2$ are not simultaneously well approximable by rational numbers,
as specified by a two-dimensional "Diophantine property", then for any fixed $N$ most initial
segments of length $N$ are close to the uniform distribution, quantitatively given by an upper
bound on their discrepancy.

In §5 we apply the results of §4 to the 3x + 1 iteration. We show using a result of Rhin [22]
that $\theta_1 = \log_B \frac{3}{2}$ and $\theta_2 = \log_B \frac{1}{2}$ have suitable two-dimensional Diophantine properties for
the results in §4 to apply. Then we establish that the 3x + 1 iterates are sufficiently close to
realizations of the stochastic process to obtain upper bounds on the discrepancy of sequences
for most initial inputs, provided we average over $1 \le n \le X$, and for $N$ iterates we require
$X \ge 2^N$. Putting all these results together yields the main result, Theorem 2.1.

The main result is established here for the 3x + 1 function, but the methods used apply
equally well to number-theoretic maps of a similar nature, such as the $Qx + 1$ function, for
odd $Q$, with $T_Q(n) = \frac{n}{2}$ or $T_Q(n) = \frac{Qn+1}{2}$ according as $n$ is even or odd. Results analogous
to Theorem 2.1 should hold for the distribution of the first $N$ iterates of such functions. For
$Q \ge 5$ it is expected that most initial values of the $Qx + 1$ iteration never enter a periodic
orbit, but diverge to $+\infty$. It seems possible that the infinite sequence $\{x_n : n \ge 0\}$ of a
divergent orbit might actually satisfy a strong Benford's law. However, at present there seems
to be no approach to address this question; even the existence of a divergent orbit for the $Qx + 1$
function, for any $Q \ge 5$, remains an open problem.

There has been other work showing that the iterates of certain dynamical systems satisfy
Benford's law; see Berger, Bunimovich and Hill, and Berger. For various properties
of Benford's law, see Hill [10]. Finally we observe that the approach of Kontorovich
and Miller to Benford's law for 3x + 1 iterates introduced several ideas to this problem,
including approximation to a stochastic process (not the one studied here), as well as a relation
to Diophantine properties of certain constants. Their approach starts from a structure formula
for odd iterates of the 3x + 1 function given by Sinai and extended in Kontorovich and
Sinai [12] to a wider class of maps. Their main result ([11, Theorem 5.3]) for the 3x + 1 function
establishes the uniform distribution in a double limit of $y_i := \log_B(w_i/w_0)$ for any real base $B$
such that $\log_B 2$ satisfies a one-dimensional Diophantine property, as defined in §4 below.

Notation. We let $\lfloor x \rfloor$ denote the largest integer that does not exceed $x$, and we let $\{\{x\}\} :=
x - \lfloor x \rfloor$ denote the fractional part of $x$, with $0 \le \{\{x\}\} < 1$. Finally $\|x\| = \min_{n \in \mathbb{Z}} |n - x|$
denotes the distance of $x$ from its nearest integer.
2. Main Result



Benford's law concerns the distribution of the initial digits in the base $B$ expansion of an infinite
sequence $\mathcal{X} = \{x_1, x_2, x_3, \ldots\}$ of positive real numbers. An infinite sequence is said to satisfy
the strong Benford's law (to base $B$) if the associated infinite sequence $\mathcal{Y} = \{y_1, y_2, y_3, \ldots\}$ given
by the base $B$ logarithms $y_j := \log_B x_j$ is uniformly distributed modulo one. Suppose that the
numbers $x_n$ have $B$-ary expansion

$$x_n = B^{M_n} \Big( \sum_{k=0}^{\infty} d_k^{(n)} B^{-k} \Big)$$

with $1 \le d_0^{(n)} \le B - 1$ and $0 \le d_k^{(n)} \le B - 1$ for $k \ge 1$. Benford's law is the statement that

$$\mathrm{Prob}\big[ d_0^{(n)} = d \big] = \log_B (d + 1) - \log_B d$$

for $1 \le d \le B - 1$, in which the "probability" is interpreted as a limiting frequency in the first
$N$ values of $x_n$ as $N \to \infty$. More generally the strong Benford probability of observing a given
block of $K$ digits $[d_0 d_1 \cdots d_{K-1}]$, with $d_0 \ne 0$, is given by

$$\mathrm{Prob}\big[ d_0^{(n)} d_1^{(n)} \cdots d_{K-1}^{(n)} = d_0 d_1 \cdots d_{K-1} \big] = \log_B (r + B^{-K+1}) - \log_B r,$$

where

$$r = \sum_{j=0}^{K-1} d_j B^{-j}. \tag{2.1}$$
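The block probabilities above can be evaluated numerically. The following Python sketch is an illustration only; the function name `benford_block_probability` is a hypothetical choice. It computes $\log_B(r + B^{-K+1}) - \log_B r$ for a given digit block, with $r$ formed as in (2.1).

```python
import math


def benford_block_probability(digits, B=10):
    """Strong Benford probability of the leading digit block [d_0 ... d_{K-1}] in base B."""
    if not digits or digits[0] == 0:
        raise ValueError("the leading digit d_0 must be nonzero")
    K = len(digits)
    # r = sum_j d_j * B^(-j), the B-ary rational of (2.1), so that 1 <= r < B.
    r = sum(d * B ** (-j) for j, d in enumerate(digits))
    return math.log(r + B ** (-K + 1), B) - math.log(r, B)


if __name__ == "__main__":
    for d in range(1, 10):
        print(d, round(benford_block_probability([d]), 5))
```

For $K = 1$ the expression reduces to $\log_B(d+1) - \log_B d$, and the probabilities of all blocks of a fixed length sum to one because the sum telescopes.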

The departure from uniform distribution modulo one of a finite set $\mathcal{Y}$ can be measured
using the discrepancy.

Definition 2.1. The discrepancy $D(\mathcal{Y})$ of a finite set $\mathcal{Y} = \{y_1, y_2, \ldots, y_N\}$ of real numbers is
defined as follows. For $0 \le \alpha < \beta \le 1$ set

$$Z(\mathcal{Y}; \alpha, \beta) := \frac{1}{N} \#\{i : \alpha \le \{\{y_i\}\} < \beta\}, \tag{2.2}$$

in which $\{\{y\}\} = y - \lfloor y \rfloor$ is the fractional part of $y$, and then let

$$D(\mathcal{Y}; \alpha, \beta) := Z(\mathcal{Y}; \alpha, \beta) - (\beta - \alpha). \tag{2.3}$$

The (normalized) discrepancy $D(\mathcal{Y})$ is then

$$D(\mathcal{Y}) := \sup_{0 \le \alpha < \beta \le 1} |D(\mathcal{Y}; \alpha, \beta)|. \tag{2.4}$$

It is also given by

$$D(\mathcal{Y}) = \sup_{0 \le \alpha \le 1} D(\mathcal{Y}; 0, \alpha) - \inf_{0 \le \alpha \le 1} D(\mathcal{Y}; 0, \alpha). \tag{2.5}$$

One has $0 \le D(\mathcal{Y}) \le 1$; smaller values of $D(\mathcal{Y})$ correspond to more uniformly spaced sets
$\mathcal{Y}$ modulo one. No finite distribution can be perfectly uniform, so there is a nonzero lower
bound on the discrepancy of all sequences of length $N$. This minimal value of the discrepancy
is attained by equally spaced elements $y_i = \frac{i}{N}$ for $0 \le i \le N - 1$, with $D(\mathcal{Y}) = \frac{1}{N}$. This notion
of discrepancy is translation-invariant; that is, for any real $y_0$, one has

$$D(\mathcal{Y} + y_0) = D(\mathcal{Y}). \tag{2.6}$$

Some authors treat instead a (normalized) non-translation invariant discrepancy

$$D^*(\mathcal{Y}) := \sup_{0 \le \alpha \le 1} |Z(\mathcal{Y}; 0, \alpha) - \alpha|.$$

This is related to $D(\mathcal{Y})$ by the inequalities $D^*(\mathcal{Y}) \le D(\mathcal{Y}) \le 2D^*(\mathcal{Y})$.

Our definition of discrepancy follows Kuipers and Niederreiter [13] and Drmota and Tichy
[7]. A few authors (Montgomery [19]) study an unnormalized discrepancy that does not divide
by $N$; this version of the discrepancy takes values between 0 and $N$.
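The normalized discrepancy of a finite set can be computed exactly from the sorted fractional parts. The following Python sketch is an illustration only, assuming half-open counting intervals $[0, \alpha)$ and the sup-minus-inf form of the definition; the function name `discrepancy` is a hypothetical choice.

```python
import math


def discrepancy(ys):
    """Normalized discrepancy D(Y) of a finite list of reals, as sup minus inf of D(Y; 0, a)."""
    N = len(ys)
    if N == 0:
        raise ValueError("need at least one point")
    fracs = sorted(y - math.floor(y) for y in ys)
    # The function a -> D(Y; 0, a) decreases with slope -1 between points and
    # jumps up by 1/N just after each point, so its sup is approached just
    # after a point and its inf is attained at a point.
    sup_val = max((i + 1) / N - f for i, f in enumerate(fracs))
    inf_val = min(i / N - f for i, f in enumerate(fracs))
    return sup_val - inf_val


if __name__ == "__main__":
    print(discrepancy([0.0, 0.25, 0.5, 0.75]))  # equally spaced points
```

For $N$ equally spaced points the function returns $1/N$ up to rounding, and shifting all points by a common constant leaves the value unchanged up to rounding, in line with the translation invariance noted above.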

The main result of this paper is an upper bound on the discrepancy of the base $B$ logarithms
of most initial 3x + 1 sequences.

Theorem 2.1. Let $B \ge 2$ be a fixed integer base. For each $N \ge 1$ and each $X \ge 2^N$, most
initial seeds $x_0$ in $1 \le x_0 \le X$ have first $N$ initial 3x + 1 iterates $\{x_k : 1 \le k \le N\}$ that satisfy
the discrepancy bound

$$D(\{\log_B x_k : 1 \le k \le N\}) \le 2N^{-\frac{1}{36}}. \tag{2.7}$$

The set $\mathcal{E}(X, B)$ of exceptional initial seeds $x_0$ in $1 \le x_0 \le X$ that do not satisfy this bound
has cardinality

$$|\mathcal{E}(X, B)| \le c(B) N^{-\frac{1}{36}} X, \tag{2.8}$$

where $c(B)$ is a positive constant depending only on $B$.
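To get a feeling for the size of the bound in (2.7), read here as $2N^{-1/36}$ in agreement with the exponents used in the proof in §5, the following Python sketch evaluates it for a few values of $N$. The function name `theorem_bound` is a hypothetical choice. Since the discrepancy never exceeds 1, the bound carries information only when $2N^{-1/36} < 1$, that is, when $N > 2^{36}$.

```python
def theorem_bound(N):
    """Value of 2 * N^(-1/36), the right side of (2.7) as read above."""
    return 2.0 * N ** (-1.0 / 36.0)


if __name__ == "__main__":
    for N in (10 ** 3, 10 ** 6, 2 ** 36, 2 ** 72):
        print(N, theorem_bound(N))
```

The theorem is therefore an asymptotic statement: for numerically accessible $N$ the stated inequality is weaker than the trivial bound $D \le 1$.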

This result implies approximation to base $B$ Benford's law, as follows. Let $\mathcal{X} = \{x_1, \ldots, x_N\}$
be a set of positive real numbers, and set $y_i = \log_B x_i$ and $\mathcal{Y} = \{y_1, \ldots, y_N\}$. Let $r$ be
a $B$-ary rational as in (2.1) with $1 \le r < B$. Requiring that the first $K$ digits of $x_n$ match the
digits of $r$ is clearly equivalent to having $\{\{y_n\}\}$ lie in the interval $[\log_B r, \log_B (r + B^{-K+1}))$.
From the definition of discrepancy, we have that

$$\Big| \frac{1}{N} \#\{1 \le i \le N : \log_B r \le \{\{\log_B x_i\}\} < \log_B (r + B^{-K+1})\} - \log_B \Big( \frac{r + B^{-K+1}}{r} \Big) \Big|$$

is bounded above by $D(\{y_1, y_2, \ldots, y_N\})$, independent of $K$. Theorem 2.1 gives an upper bound on this
discrepancy for the initial iterates of most 3x + 1 sequences.

3. Discrepancy and Exponential Sums 

We will use standard criteria for uniform distribution of an infinite sequence $\mathcal{Y} = \{y_1, y_2, \ldots\}$
in terms of exponential sums and of the discrepancy of its initial segments ([19, Chap. 1]).

For an infinite sequence $\mathcal{Y} = \{y_1, y_2, \ldots\}$ we let $\mathcal{Y}_N$ denote the first $N$ elements of $\mathcal{Y}$. For
integers $k$, we associate to $\mathcal{Y}_N$ the 'Fourier coefficients'

$$U_N(k, \mathcal{Y}) = U(k, \mathcal{Y}_N) := \sum_{j=1}^{N} e^{2\pi i k y_j}. \tag{3.1}$$
Proposition 3.1. For an infinite sequence $\mathcal{Y} = \{y_1, y_2, \ldots\}$ of real numbers, the following
conditions on $\mathcal{Y}$ are equivalent.

(1) The sequence $\mathcal{Y}$ is uniformly distributed modulo one.

(2) (Weyl's criterion) For each nonzero integer $k$ we have

$$\lim_{N \to \infty} \frac{1}{N} |U_N(k, \mathcal{Y})| = 0. \tag{3.2}$$

(3) For any properly Riemann integrable function $F$ on $[0, 1]$,

$$\lim_{N \to \infty} \frac{1}{N} \sum_{j=1}^{N} F(\{\{y_j\}\}) = \int_0^1 F(t) \, dt. \tag{3.3}$$

(4) The discrepancy $D(\mathcal{Y}_N)$ satisfies

$$\lim_{N \to \infty} D(\{y_1, y_2, \ldots, y_N\}) = 0. \tag{3.4}$$

Proof. Here (1)-(3) are Weyl's criterion in [19, page 1], and the equivalence of (1) and (4)
appears in [19, page 2]. ■

We will need a quantitative relation between exponential sums $U_N(k, \mathcal{Y})$ and discrepancy,
given by the Erdős-Turán inequality.

Proposition 3.2. (Erdős-Turán Inequality) For any positive integer $K \ge 1$,

$$D(\{y_1, y_2, \ldots, y_N\}) \le \frac{1}{K+1} + 3 \sum_{k=1}^{K} \frac{1}{Nk} \Big| \sum_{n=1}^{N} e^{2\pi i k y_n} \Big|. \tag{3.5}$$

Proof. This is a weak form of the Erdős-Turán inequality. A short proof of it is given in
Montgomery [19, page 8] (after normalizing the discrepancy). For a stronger form see Kuipers
and Niederreiter [13, Ch. 2, Theorem 2.5]. ■
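The two quantities entering the Erdős-Turán inequality are easy to evaluate for a concrete finite set. The following Python sketch is an illustration only; the names `weyl_sum` and `erdos_turan_bound` are hypothetical. It computes the exponential sum $\sum_n e^{2\pi i k y_n}$ and the upper bound $\frac{1}{K+1} + 3\sum_{k=1}^{K} \frac{1}{Nk}\big|\sum_n e^{2\pi i k y_n}\big|$.

```python
import cmath
import math


def weyl_sum(ys, k):
    """Exponential sum U_N(k) = sum over n of exp(2 pi i k y_n)."""
    return sum(cmath.exp(2j * math.pi * k * y) for y in ys)


def erdos_turan_bound(ys, K):
    """Upper bound 1/(K+1) + 3 * sum_{k=1}^{K} |U_N(k)| / (N k) on the discrepancy."""
    N = len(ys)
    tail = sum(abs(weyl_sum(ys, k)) / (N * k) for k in range(1, K + 1))
    return 1.0 / (K + 1) + 3.0 * tail


if __name__ == "__main__":
    ys = [j / 8 for j in range(8)]
    print(abs(weyl_sum(ys, 1)), erdos_turan_bound(ys, 3))
```

For $N$ equally spaced points the sums vanish for every $k$ that is not a multiple of $N$, so the bound reduces to $1/(K+1)$ when $K < N$; in general the cutoff $K$ trades the first term against the size of the exponential sums.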

We will also need the following simple bound on the change in discrepancy under perturbation.

Proposition 3.3. If $|y_i - y_i'| \le \epsilon$ for $1 \le i \le N$ then

$$\big| D(\{y_1', y_2', \ldots, y_N'\}) - D(\{y_1, y_2, \ldots, y_N\}) \big| \le 2\epsilon. \tag{3.6}$$

Proof. Let $\mathcal{Y}$ and $\mathcal{Y}'$ denote the sets in the Proposition. Suppose first the discrepancy $D(\mathcal{Y})$
is attained on an interval $J = [\alpha, \beta]$ with $Z(\mathcal{Y}; J) - |J| > 0$. If $\alpha \ge \epsilon$ and $\beta \le 1 - \epsilon$, then with
$J' = [\alpha - \epsilon, \beta + \epsilon]$ we see that $Z(\mathcal{Y}'; J') \ge Z(\mathcal{Y}; J)$ and it follows that

$$D(\mathcal{Y}') \ge Z(\mathcal{Y}'; J') - |J'| \ge Z(\mathcal{Y}; J) - |J| - 2\epsilon = D(\mathcal{Y}) - 2\epsilon.$$

If $\alpha < \epsilon$ or $\beta > 1 - \epsilon$ we would still like to consider $J' \subset [0, 1]$ which is the image (mod 1)
of the interval $[\alpha - \epsilon, \beta + \epsilon]$. The only issue is that $J'$ now consists of two intervals, one
near 0 and the other near 1. However, the complement $J'^c$ is a genuine interval and we have
$|J'^c| - Z(\mathcal{Y}'; J'^c) = Z(\mathcal{Y}'; J') - |J'| \ge D(\mathcal{Y}) - 2\epsilon$. Thus we have again that $D(\mathcal{Y}') \ge D(\mathcal{Y}) - 2\epsilon$.
In the remaining case that the discrepancy $D(\mathcal{Y})$ is attained on an interval $J = [\alpha, \beta]$ with
$|J| - Z(\mathcal{Y}; J) > 0$, we consider $J' = [\alpha + \epsilon, \beta - \epsilon]$ if $\beta - \alpha > 2\epsilon$, and $J'$ to be the empty interval
otherwise. We deduce in this case also that $D(\mathcal{Y}') \ge D(\mathcal{Y}) - 2\epsilon$.

Since $\mathcal{Y}$ and $\mathcal{Y}'$ are interchangeable in the argument, we obtain $D(\mathcal{Y}) \ge D(\mathcal{Y}') - 2\epsilon$,
completing the proof. ■

In the sequel we will obtain bounds on exponential sums and from this derive bounds on
the discrepancy using the Erdős-Turán inequality. We will approximate the values $y_i = \log_B x_i$
of the 3x + 1 iterates of a randomly drawn initial value $x_0$ by the values of a stochastic process,
of a type which we analyze in the next section.



4. Stochastic Process 



We study the following family of stochastic processes. We suppose that we are given two
real numbers $(\theta_1, \theta_2)$, and an initial value $y_0$. The discrete stochastic process $\mathcal{P}(\theta_1, \theta_2, y_0)$ has
realizations of the form

$$\omega = (y_1, y_2, y_3, \ldots) \tag{4.1}$$

in which the $y_i$ are generated from the initial value $y_0$ by choosing

$$y_{n+1} = y_n + \theta_1 \text{ with probability } \tfrac{1}{2}, \text{ and } y_{n+1} = y_n + \theta_2 \text{ with probability } \tfrac{1}{2}, \tag{4.2}$$

where each step is an independent Bernoulli trial. We think of the $y_i$ as given modulo one, in
which case this process is a Bernoulli mixture of two rotations of the circle.

Theorem 4.1. If at least one of $\theta_1$ or $\theta_2$ is irrational, then for any fixed initial value $y_0$
the process $\mathcal{P}(\theta_1, \theta_2, y_0)$ has a probability one subset of realizations $\omega = (y_1, y_2, \ldots)$ that are
uniformly distributed modulo one. Equivalently, with probability one,

$$\lim_{N \to \infty} D(\{y_1, \ldots, y_N\}) = 0. \tag{4.3}$$

Note that if $\theta_1$ and $\theta_2$ are both rational numbers, then the $y_k$ can only take a finite
number of distinct values modulo one and no realization $\omega$ is uniformly distributed modulo
one. We also remark that Theorem 4.1 may be easily generalized to cover Bernoulli mixtures
of $K$ rotations of the circle.
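A realization of the process can be sampled directly. The following Python sketch is an illustration only; the function name `realization` and the use of Python's `random.Random` generator are assumptions made here. It draws the first $N$ terms, reduced modulo one, adding one of two fixed increments with probability 1/2 each.

```python
import math
import random


def realization(theta1, theta2, N, y0=0.0, rng=None):
    """First N terms (mod 1) of the process adding theta1 or theta2 with probability 1/2 each."""
    rng = rng or random.Random()
    ys = []
    y = y0
    for _ in range(N):
        y += theta1 if rng.random() < 0.5 else theta2
        ys.append(y % 1.0)
    return ys


if __name__ == "__main__":
    # Increments log_10(3/2) and log_10(1/2), the 3x + 1 case with base 10.
    sample = realization(math.log10(1.5), math.log10(0.5), 10, rng=random.Random(0))
    print(sample)
```

With two rational increments such as 1/2 and 1/4 every term lies in a finite set modulo one, which is the degenerate case excluded by the irrationality hypothesis.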

Theorem 4.1 will be derived using exponential sums. We first study finite initial segments
of length $N$ of such a stochastic process $\mathcal{P}(\theta_1, \theta_2, y_0)$. We let

$$\omega_N := (y_1, y_2, \ldots, y_N)$$

denote such an initial segment, and write $E_{\omega_N}[f(\omega_N)]$ for the expected value of a random
variable over the process restricted to these initial segments. We begin by calculating the
second moment of the individual Fourier coefficients $U_N(k, \omega)$ of $\omega_N$.

Lemma 4.1. For each $N \ge 1$ and each $k \in \mathbb{Z}$,

$$E_{\omega_N}\big[ |U_N(k, \omega)|^2 \big] = N + 2 \, \mathrm{Re} \Big( \sum_{r=1}^{N} (N - r) \Big( \frac{e^{2\pi i k \theta_1} + e^{2\pi i k \theta_2}}{2} \Big)^r \Big). \tag{4.4}$$

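If at least one of $\theta_1$ or $\theta_2$ is irrational, then for each non-zero integer $k$ and each $N \ge 1$

$$E_{\omega_N}\big[ |U_N(k, \omega)|^2 \big] \le \Big( 1 + \frac{8}{|2 - e^{2\pi i k \theta_1} - e^{2\pi i k \theta_2}|} \Big) N \le \Big( 1 + \frac{1}{\|k\theta_1\|^2 + \|k\theta_2\|^2} \Big) N, \tag{4.5}$$

where $\|\xi\| = \min_{n \in \mathbb{Z}} |\xi - n|$ denotes the distance between $\xi$ and its nearest integer.

Proof. Observe that

$$|U_N(k, \omega)|^2 = \Big| \sum_{j=1}^{N} e^{2\pi i k y_j} \Big|^2 = N + 2 \, \mathrm{Re} \Big( \sum_{1 \le j < \ell \le N} e^{2\pi i k (y_\ell - y_j)} \Big).$$

If we write $r = \ell - j$ then $y_\ell - y_j$ is a sum of $r$ random variables each taking the values $\theta_1$ or
$\theta_2$ with equal probability. Thus

$$E_{\omega_N}\big[ e^{2\pi i k (y_\ell - y_j)} \big] = \Big( \frac{e^{2\pi i k \theta_1} + e^{2\pi i k \theta_2}}{2} \Big)^{\ell - j}.$$

Since for $1 \le r \le N$ there are $N - r$ pairs $1 \le j < \ell \le N$ with $\ell - j = r$, we conclude that

$$E_{\omega_N}\big[ |U_N(k, \omega)|^2 \big] = N + 2 \, \mathrm{Re} \Big( \sum_{r=1}^{N} (N - r) \Big( \frac{e^{2\pi i k \theta_1} + e^{2\pi i k \theta_2}}{2} \Big)^r \Big).$$

This proves (4.4).

For any $z \ne 1$ we note that

$$\sum_{r=1}^{N} (N - r) z^r = \frac{(N - 1) z - N z^2 + z^{N+1}}{(1 - z)^2},$$

and so, if $|z| \le 1$ and $z \ne 1$ we get that

$$\Big| \sum_{r=1}^{N} (N - r) z^r \Big| \le \frac{N |z - z^2| + |z - z^{N+1}|}{|1 - z|^2} \le \frac{2N}{|1 - z|}. \tag{4.6}$$

If at least one of $\theta_1$ or $\theta_2$ is irrational, then for non-zero $k$ we have that $e^{2\pi i k \theta_1} + e^{2\pi i k \theta_2} \ne 2$,
and of course $|e^{2\pi i k \theta_1} + e^{2\pi i k \theta_2}| \le 2$. Combining (4.4) and (4.6) with $z = (e^{2\pi i k \theta_1} + e^{2\pi i k \theta_2})/2$,
we obtain that

$$E_{\omega_N}\big[ |U_N(k, \omega)|^2 \big] \le \Big( 1 + \frac{8}{|2 - e^{2\pi i k \theta_1} - e^{2\pi i k \theta_2}|} \Big) N.$$

For $|\xi| \le 1/2$ note that $\sin^2(\pi \xi) \ge 4\xi^2$ and so

$$|2 - e^{2\pi i k \theta_1} - e^{2\pi i k \theta_2}| \ge 2 - \cos(2\pi k \theta_1) - \cos(2\pi k \theta_2) = 2 \big( \sin^2(\pi k \theta_1) + \sin^2(\pi k \theta_2) \big) \ge 8 \big( \|k\theta_1\|^2 + \|k\theta_2\|^2 \big),$$

which completes the proof of (4.5). ■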
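The second-moment identity of Lemma 4.1 can be checked by brute force for small $N$. The following Python sketch is an illustration only, with hypothetical function names. It averages $|U_N(k,\omega)|^2$ over all $2^N$ equally likely increment sequences and compares the result with the closed form $N + 2\,\mathrm{Re}\sum_{r=1}^{N}(N-r)z^r$, where $z = (e^{2\pi i k\theta_1} + e^{2\pi i k\theta_2})/2$.

```python
import cmath
import itertools
import math


def second_moment_formula(N, k, theta1, theta2):
    """Closed form N + 2 Re sum_{r=1}^{N} (N - r) z^r with z the mean of the two phases."""
    z = (cmath.exp(2j * math.pi * k * theta1) + cmath.exp(2j * math.pi * k * theta2)) / 2
    return N + 2 * sum((N - r) * z ** r for r in range(1, N + 1)).real


def second_moment_bruteforce(N, k, theta1, theta2, y0=0.0):
    """Average of |U_N(k, omega)|^2 over all 2^N equally likely increment sequences."""
    total = 0.0
    for steps in itertools.product((theta1, theta2), repeat=N):
        y = y0
        U = 0j
        for step in steps:
            y += step
            U += cmath.exp(2j * math.pi * k * y)
        total += abs(U) ** 2
    return total / 2 ** N


if __name__ == "__main__":
    t1, t2 = math.log10(1.5), math.log10(0.5)
    print(second_moment_formula(8, 1, t1, t2), second_moment_bruteforce(8, 1, t1, t2))
```

The two values agree up to rounding, and the result does not depend on the initial value $y_0$, since only differences $y_\ell - y_j$ enter the second moment.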

Proof of Theorem 4.1. We suppose that at least one of $\theta_1$ or $\theta_2$ is irrational. We claim that
for each nonzero $k$ there holds

$$\mathrm{Prob}_\omega \Big[ \lim_{N \to \infty} \frac{1}{N} |U_N(k, \omega)| = 0 \Big] = 1. \tag{4.7}$$

Thus, for each fixed non-zero integer $k$, there is a probability one set of $\omega$ such that
$\lim_{N \to \infty} \frac{1}{N} |U_N(k, \omega)| = 0$. Since the set of non-zero integers $k$ is countable, it follows that the
set of all $\omega$ for which $\lim_{N \to \infty} \frac{1}{N} |U_N(k, \omega)| = 0$ holds simultaneously for all non-zero integers
$k$ still has probability one. (Its complement is a countable union of sets of measure zero.)
Now by Weyl's criterion (Proposition 3.1(2)) all such $\omega$ are uniformly distributed modulo one.
Proposition 3.1(4) then yields (4.3) with probability one.

To prove (4.7) it suffices to show that for each $1 > \delta > 0$,

$$P_\delta := \mathrm{Prob}_\omega \Big[ \limsup_{N \to \infty} \frac{1}{N} |U_N(k, \omega)| \ge \delta \Big] = 0. \tag{4.8}$$

For j > 1 set Nj := 
that 



. If Nj < N < Nj+i is such that \Ujf(k,u)\ > 5N, then we see 



N 



\U Nj (k,u)\ > \U N (k,u)\ 

Therefore, for any B > 1, 
P s < Prob 1 



>SN-(N- Nj) > Nj 



5 )n> 6 -n. 



limsup— | (A;, w) | > -] < ^Prob^C^-CM)! > — ^ 



(4.9) 



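Now

$$\mathrm{Prob}_\omega \Big[ |U_{N_j}(k, \omega)| \ge \frac{\delta N_j}{2} \Big] \le \frac{4}{\delta^2 N_j^2} E_{\omega_{N_j}} \big[ |U_{N_j}(k, \omega)|^2 \big],$$

and by Lemma 4.1 this is

$$\le \frac{4}{\delta^2} \Big( 1 + \frac{1}{\|k\theta_1\|^2 + \|k\theta_2\|^2} \Big) \frac{1}{N_j}.$$

We use this in (4.9), and obtain that for any $R \ge 1$,

$$P_\delta \le \frac{4}{\delta^2} \Big( 1 + \frac{1}{\|k\theta_1\|^2 + \|k\theta_2\|^2} \Big) \sum_{j=R}^{\infty} \frac{1}{N_j}.$$

Since the $N_j$ grow exponentially, letting $R \to \infty$ we may conclude that $P_\delta = 0$. This establishes
(4.8) and (4.7), and the Theorem follows. ■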

For general non-rational pairs $(\theta_1, \theta_2)$ the convergence rate to zero in (4.3), or equivalently
(4.7), can be arbitrarily slow. To obtain explicit bounds on the convergence rate in (4.3) one
must impose restrictions on the Diophantine approximation properties of the numbers $\theta_1$ and
$\theta_2$. The following definition has been much used in connection with "small divisors" problems
in dynamical systems, cf. Herman, Yoccoz [27], [28], and in number theoretical dynamics,
cf. Marklof.

Definition 4.1. A real number $\theta$ is said to be Diophantine with exponent $\alpha$ if there is a
positive constant $C(\theta)$ such that for all integers $k \ge 1$

$$\|k\theta\| \ge C(\theta) |k|^{-\alpha}. \tag{4.10}$$

Any real number that is Diophantine with some positive exponent $\alpha$ is irrational; necessarily
$\alpha \ge 1$. For any $\alpha > 1$, the set of real numbers that are Diophantine with exponent $\alpha$ has full
Lebesgue measure. In fact the exceptional set of real numbers that are not Diophantine with a
given exponent $\alpha > 1$ has Hausdorff dimension $f(\alpha)$ with $f(\alpha) < 1$. Liouville numbers are those
real numbers that are not Diophantine for any finite exponent, and they form an uncountable
set of Hausdorff dimension zero. The real numbers that are Diophantine with exponent
$\alpha = 1$ are the badly approximable numbers, and these form a set of Hausdorff dimension
one but Lebesgue measure zero.
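The Diophantine condition can be explored numerically over a finite range of $k$. The following Python sketch is an illustration only, with hypothetical function names. It computes $\min_{1 \le k \le k_{\max}} k^{\alpha}\|k\theta\|$, which is an upper bound for any admissible constant $C(\theta)$; a finite search of this kind can suggest, but never prove, that a number is Diophantine.

```python
import math


def dist_to_nearest_int(x):
    """Distance ||x|| from x to the nearest integer."""
    return abs(x - round(x))


def diophantine_constant_estimate(theta, alpha, k_max):
    """Minimum of k^alpha * ||k theta|| over 1 <= k <= k_max (floating-point estimate)."""
    return min(k ** alpha * dist_to_nearest_int(k * theta) for k in range(1, k_max + 1))


if __name__ == "__main__":
    # sqrt(2) is badly approximable, so the estimate with alpha = 1 stays away from zero.
    print(diophantine_constant_estimate(math.sqrt(2), 1, 1000))
    # A rational number fails the condition for every exponent.
    print(diophantine_constant_estimate(0.5, 1, 10))
```

Floating-point rounding limits the usable range of $k$, so the sketch is suitable only for moderate values of $k_{\max}$.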

In this paper we use the following generalization of this notion to simultaneous approximation,
which is the complement of the notion of $d$-dimensional very well approximable vectors
appearing in Schmidt.

Definition 4.2. The vector $(\theta_1, \theta_2, \ldots, \theta_d)$ of real numbers is said to be $d$-dimensional Diophantine
with exponent $\alpha$ if there is a positive constant $C(\theta_1, \theta_2, \ldots, \theta_d)$ such that for all integers
$k \ge 1$

$$\max(\|k\theta_1\|, \|k\theta_2\|, \ldots, \|k\theta_d\|) \ge C(\theta_1, \theta_2, \ldots, \theta_d) k^{-\alpha}. \tag{4.11}$$

This notion has been used in the dynamical system context by Marklof. Here
we use the case $d = 2$. The multidimensional notion is less restrictive than the case $d = 1$
in the sense that if any $\theta_i$ is one-dimensional Diophantine with exponent $\alpha$, then the vector
$(\theta_1, \ldots, \theta_d)$ will be $d$-dimensional Diophantine with the same or smaller exponent.

The next result gives bounds on the expected size of the discrepancy of a finite initial
segment of this stochastic process, under suitable Diophantine conditions on $(\theta_1, \theta_2)$.

Theorem 4.2. Suppose that the pair $(\theta_1, \theta_2)$ is two-dimensional Diophantine with exponent
$\alpha$. Then there is a constant $c_2(\theta_1, \theta_2)$ such that for all $N \ge 1$,

$$E_{\omega_N}\big[ D(\{y_1, y_2, \ldots, y_N\}) \big] \le c_2(\theta_1, \theta_2) N^{-\frac{1}{2(1+\alpha)}}. \tag{4.12}$$

Proof. The Erdős-Turán inequality (Proposition 3.2) gives that for any $K$,

$$E_{\omega_N}\big[ D(\{y_1, \ldots, y_N\}) \big] \le \frac{1}{K+1} + 3 \sum_{k=1}^{K} \frac{1}{Nk} E_{\omega_N}\big[ |U_N(k, \omega)| \big]. \tag{4.13}$$

By the Cauchy-Schwarz inequality, (4.5), and the definition of the two-dimensional Diophantine
property we have that

$$E_{\omega_N}\big[ |U_N(k, \omega)| \big] \le \Big( E_{\omega_N}\big[ |U_N(k, \omega)|^2 \big] \Big)^{\frac{1}{2}} \le \big( 1 + C(\theta_1, \theta_2)^{-2} k^{2\alpha} \big)^{\frac{1}{2}} \sqrt{N}. \tag{4.14}$$

Using this in (4.13) we obtain that for an appropriate constant $C_1(\theta_1, \theta_2)$,

$$E_{\omega_N}\big[ D(\{y_1, \ldots, y_N\}) \big] \le \frac{1}{K+1} + C_1(\theta_1, \theta_2) \frac{K^{\alpha}}{\sqrt{N}}.$$

Choosing $K = N^{\frac{1}{2(1+\alpha)}}$ we obtain the Theorem. ■
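The choice of the cutoff $K$ at the end of the proof balances the two terms of the bound. The following Python sketch is an illustration only; the function name `balanced_cutoff` is hypothetical, and the bound is read here as $\frac{1}{K+1} + C_1 K^{\alpha}/\sqrt{N}$ with the constant set to 1. It evaluates $K = N^{1/(2(1+\alpha))}$ together with the two terms $1/K$ and $K^{\alpha}/\sqrt{N}$.

```python
def balanced_cutoff(N, alpha):
    """Return K = N^(1/(2(1+alpha))) and the two balanced terms 1/K and K^alpha / sqrt(N)."""
    K = N ** (1.0 / (2.0 * (1.0 + alpha)))
    return K, 1.0 / K, K ** alpha / N ** 0.5


if __name__ == "__main__":
    K, first, second = balanced_cutoff(10 ** 6, 7.616)
    print(K, first, second)
    print(1.0 / (2.0 * (1.0 + 7.616)))  # resulting exponent for alpha = 7.616
```

Both terms are of size $N^{-1/(2(1+\alpha))}$ for this choice; with $\alpha = 7.616$ the exponent $1/(2(1+\alpha))$ is slightly larger than $1/18$.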



Remark. The stochastic process studied in this section can be reformulated in terms of
the iterates of a skew-product dynamical system, as defined in Cornfeld, Fomin and Sinai
(Chap. 10) and Petersen. Let $\Sigma = \{0, 1\}^{\mathbb{N}}$ denote the set of all zero-one sequences
$s = (s_0, s_1, s_2, \ldots)$, with the product topology, which is a compact space with natural invariant
measure, and let $S : \Sigma \to \Sigma$ be the shift operator $S(s_0, s_1, s_2, \ldots) = (s_1, s_2, s_3, \ldots)$. The
skew-product dynamical system $\mathcal{T} : \Sigma \times \mathbb{T} \to \Sigma \times \mathbb{T}$ over the base $\Sigma$, with fibers $\mathbb{T} = \mathbb{R}/\mathbb{Z}$, is defined
by

$$\mathcal{T}(s, x) := (S(s), x + f(s_0) \ (\mathrm{mod}\ 1)),$$

with $f(0) = \theta_1$, $f(1) = \theta_2$, respectively. Here the initial condition is $(s^{(0)}, x_0)$, with $s^{(0)} \in \Sigma$
being a random starting point. The invariant measure on $\Sigma \times \mathbb{T}$ is the product measure, using
Lebesgue measure on $\mathbb{T}$, and $\mathcal{T}$ is ergodic with respect to this measure if at least one of $\theta_1$ and
$\theta_2$ is irrational. The initial result of this section (Theorem 4.1) shows weak convergence of
almost all orbits to Lebesgue measure on $\mathbb{T}$ for the dynamical system. This result is true in
great generality for ergodic skew products. However the detailed result on rate of convergence
to Lebesgue measure (Theorem 4.2) relies on specific properties of this dynamical system.



5. Application to the 3x + 1 map 

We can describe the 3x + 1 iteration applied to an integer $m$ in terms of the parity of its iterates.
We set $T^{(0)}(m) = m$ and define the parity sequence $\{b_k(m) : k \ge 0\}$ with each $b_k(m) \in \{0, 1\}$
by

$$b_k(m) \equiv T^{(k)}(m) \pmod{2}. \tag{5.1}$$

Proposition 5.1. (1) The $k$-th iterate $T^{(k)}(m)$ for $k \ge 1$ has the form

$$T^{(k)}(m) = \frac{3^{b_0(m) + \cdots + b_{k-1}(m)}}{2^k} m + R_k(m) \tag{5.2}$$

in which the remainder term

$$R_k(m) := \sum_{j=0}^{k-1} b_j(m) \frac{3^{b_{j+1}(m) + \cdots + b_{k-1}(m)}}{2^{k-j}} \tag{5.3}$$

depends only on $m \pmod{2^k}$.

(2) Each $b_k(m)$ depends only on $m \pmod{2^{k+1}}$. For each vector $(b_0, b_1, \ldots, b_{N-1}) \in
\{0, 1\}^N$ there is a unique residue class $m \pmod{2^N}$ such that

$$b_k(m) = b_k \quad \text{for } 0 \le k \le N - 1. \tag{5.4}$$

Proof. (1) This is easily proved by induction on $k$, see Lagarias [14, (2.6)].
(2) This is also proved by induction on $k$, see Lagarias [14, Theorem B]. ■
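Both parts of Proposition 5.1 can be checked exactly for small cases. The following Python sketch is an illustration only, with hypothetical function names, using exact rational arithmetic from the standard `fractions` module. It verifies the decomposition of $T^{(k)}(m)$ into a main term $3^{b_0 + \cdots + b_{k-1}} m / 2^k$ plus a remainder built from the parity sequence, and counts the distinct parity vectors over one full set of residues modulo $2^N$.

```python
from fractions import Fraction


def T(n):
    """3x + 1 function: n/2 for even n, (3n + 1)/2 for odd n."""
    return n // 2 if n % 2 == 0 else (3 * n + 1) // 2


def parity_vector(m, k):
    """Parities (b_0(m), ..., b_{k-1}(m)) of m, T(m), ..., T^(k-1)(m)."""
    bits = []
    x = m
    for _ in range(k):
        bits.append(x % 2)
        x = T(x)
    return bits


def remainder(bits):
    """R_k = sum_j b_j * 3^(b_{j+1} + ... + b_{k-1}) / 2^(k-j), as an exact fraction."""
    k = len(bits)
    return sum(Fraction(bits[j] * 3 ** sum(bits[j + 1:]), 2 ** (k - j)) for j in range(k))


def check_structure_formula(m, k):
    """True if T^(k)(m) equals 3^(b_0+...+b_{k-1}) m / 2^k + R_k(m)."""
    bits = parity_vector(m, k)
    x = m
    for _ in range(k):
        x = T(x)
    return Fraction(3 ** sum(bits), 2 ** k) * m + remainder(bits) == x


if __name__ == "__main__":
    print(all(check_structure_formula(m, k) for m in range(1, 200) for k in range(1, 12)))
    print(len({tuple(parity_vector(m, 6)) for m in range(1, 2 ** 6 + 1)}))
```

The second printed number counts the parity vectors of length 6 attained by $1 \le m \le 2^6$; each of the $2^6$ possible vectors occurring exactly once corresponds to part (2).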



We define $x_k(m) = T^{(k)}(m)$ and view

$$\tilde{x}_k(m) := \frac{3^{b_0(m) + \cdots + b_{k-1}(m)}}{2^k} m \tag{5.5}$$

as an approximation to $x_k(m)$. Viewing the base $B \ge 2$ as fixed, we set $y_k(m) := \log_B x_k(m)$
and the main result will concern the discrepancy of most sets $\mathcal{Y}_N(m) := \{y_1(m), \ldots, y_N(m)\}$.
We approximate the $y_k(m)$ by

$$\tilde{y}_k(m) := \log_B \tilde{x}_k(m) = \log_B m + \Big( \sum_{j=0}^{k-1} b_j(m) \Big) \log_B 3 - k \log_B 2, \tag{5.6}$$

and we will study the sets $\tilde{\mathcal{Y}}_N(m) := \{\tilde{y}_1(m), \ldots, \tilde{y}_N(m)\}$ for variable $m$ as realizations of a
stochastic process of the kind treated in §4.

The following lemma shows that the error of approximation of $\mathcal{Y}_N(m)$ by $\tilde{\mathcal{Y}}_N(m)$ is
exponentially small in $N$ for most $m$.

Lemma 5.1. Let the integer $B \ge 2$ be fixed. There exists an exceptional subset $E_B(N)$ of
integers $1 \le m \le 2^N$ such that

$$|E_B(N)| \le 2^{1 + \frac{99}{100} N},$$

and such that if $1 \le m \le 2^N$ is not in $E_B(N)$ then

$$|y_k(n) - \tilde{y}_k(n)| \le 2^{1 - \frac{1}{100} N} \quad \text{for } 1 \le k \le N, \tag{5.7}$$

for every $n \equiv m \pmod{2^N}$.

Proof. We will prove more, and show that the set $E_B(N)$ may be taken to be the set of
integers $1 \le m \le 2^N$ such that either $m \le 2^{\frac{99}{100} N}$, or $b_0(m) + \cdots + b_{N-1}(m) \le \frac{2}{5} N$. Since all
$2^N$ possible choices for the parities $b_0(m), \ldots, b_{N-1}(m)$ occur exactly once, we see that the
number of $m$ satisfying the second criterion above is $\le \sum_{k \le \frac{2}{5} N} \binom{N}{k} \le 2^{H(\frac{2}{5}) N} \le 2^{\frac{99}{100} N}$, where
$H(x) = -x \log_2 x - (1 - x) \log_2 (1 - x)$ is the binary entropy function. Thus $|E_B(N)| \le 2^{1 + \frac{99}{100} N}$,
as desired. It remains now to show (5.7) holds for $m \notin E_B(N)$.

Suppose now that $m \notin E_B(N)$ and that $n \equiv m \pmod{2^N}$. Proposition 5.1 gives that
$b_{k-1}(n) = b_{k-1}(m)$ and $R_k(n) = R_k(m)$ for $1 \le k \le N$. Observe that

$$\frac{x_k(n)}{\tilde{x}_k(n)} = 1 + \frac{R_k(n)}{\tilde{x}_k(n)} = 1 + \frac{R_k(m)}{\tilde{x}_k(n)} \le 1 + \frac{R_k(m)}{\tilde{x}_k(m)} = \frac{x_k(m)}{\tilde{x}_k(m)},$$

from which it follows that $y_k(n) - \tilde{y}_k(n) \le y_k(m) - \tilde{y}_k(m)$. Thus it suffices to verify (5.7) for
$n = m$.

From (5.3) we see that

$$R_k(m) \le \sum_{j=0}^{k-1} \frac{3^{k-j-1}}{2^{k-j}} \le \Big( \frac{3}{2} \Big)^k.$$

Applying this bound together with $\log(1 + \xi) \le \xi$, we obtain that

$$y_k(m) - \tilde{y}_k(m) = \log_B \Big( 1 + \frac{R_k(m)}{\tilde{x}_k(m)} \Big) \le \frac{1}{\log B} \frac{R_k(m)}{\tilde{x}_k(m)} \le \frac{1}{m \log B} 3^{k - b_0(m) - \cdots - b_{k-1}(m)}. \tag{5.8}$$

Since $m \notin E_B(N)$ we have that $m > 2^{\frac{99}{100} N}$, and in addition that

$$k - \sum_{j=0}^{k-1} b_j(m) = \sum_{j=0}^{k-1} (1 - b_j(m)) \le \sum_{j=0}^{N-1} (1 - b_j(m)) \le N - \frac{2}{5} N = \frac{3}{5} N.$$

Thus from (5.8) we deduce for $m \notin E_B(N)$ that

$$|y_k(m) - \tilde{y}_k(m)| \le \frac{1}{\log B} 2^{-\frac{99}{100} N} 3^{\frac{3}{5} N} \le 2^{1 - \frac{1}{100} N},$$

(since $3^{\frac{3}{5}} \le 2^{\frac{98}{100}}$) which proves the Lemma. ■
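The numerical constants in this proof can be checked directly. The following Python sketch is an illustration only, with hypothetical function names; it assumes that the second exclusion criterion is a parity sum of at most $cN$ with threshold fraction $c = 2/5$, a reading that is consistent with the constants $\frac{99}{100}$ and $\frac{98}{100}$ appearing in the proof. It tests the two conditions the argument needs: $H(c) \le \frac{99}{100}$ for the entropy bound on the binomial sum, and $3^{1-c} \le 2^{98/100}$ for the final estimate.

```python
import math


def binary_entropy(x):
    """H(x) = -x log2 x - (1 - x) log2 (1 - x) for 0 < x < 1."""
    return -x * math.log2(x) - (1 - x) * math.log2(1 - x)


def threshold_is_admissible(c):
    """True if H(c) <= 99/100 and 3^(1 - c) <= 2^(98/100)."""
    return binary_entropy(c) <= 0.99 and 3 ** (1 - c) <= 2 ** 0.98


if __name__ == "__main__":
    print(binary_entropy(0.4), 3 ** 0.6, 2 ** 0.98)
    print(threshold_is_admissible(0.4))
    # Entropy bound on the binomial sum, checked for N = 100.
    print(sum(math.comb(100, k) for k in range(41)) <= 2 ** (binary_entropy(0.4) * 100))
```

The admissible range of $c$ is narrow: a smaller fraction violates the second condition and a fraction close to $1/2$ violates the first.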
We wish to bound the discrepancy of most sets $\mathcal{Y}_N(m)$, viewed over a range $1 \le m \le X$,
with $X \ge 2^N$. We will study the translated sets

$$\tilde{\mathcal{Y}}^*_N(m) := \tilde{\mathcal{Y}}_N(m) - \log_B m, \tag{5.9}$$

so that the initial element $\tilde{y}^*_0(m)$ is zero. Since the discrepancy function is translation invariant
we have that

$$D(\tilde{\mathcal{Y}}^*_N(m)) = D(\tilde{\mathcal{Y}}_N(m)). \tag{5.10}$$

Note also that $\tilde{\mathcal{Y}}^*_N(m) = \tilde{\mathcal{Y}}^*_N(m + 2^N)$ and so it will suffice to consider the range $1 \le m \le 2^N$.

Lemma 5.2. Let $B \ge 2$ and $N \ge 1$ be fixed. Then the ensemble $\{\tilde{\mathcal{Y}}^*_N(m) : 1 \le m \le 2^N\}$
of $2^N$ sequences of length $N$ is identical in distribution with the distribution $\omega_N$ of the first
$N$ elements of the stochastic process $\mathcal{P}(\theta_1, \theta_2, y_0 = 0)$, with parameters $\theta_1 = \log_B \frac{3}{2}$ and
$\theta_2 = \log_B \frac{1}{2}$.

Proof. From the definitions we see easily that $\tilde{y}^*_k(m) = \tilde{y}^*_{k-1}(m) + \theta_1$ if $b_{k-1}(m) = 1$, and
that $\tilde{y}^*_k(m) = \tilde{y}^*_{k-1}(m) + \theta_2$ if $b_{k-1}(m) = 0$. Proposition 5.1(2) shows that for $1 \le m \le 2^N$ all
possible patterns $(b_0, b_1, \ldots, b_{N-1})$ occur exactly once. This corresponds exactly to independent
draws in the stochastic process $\mathcal{P}(\theta_1, \theta_2, y_0 = 0)$; the $2^N$ possible sequences $\omega_N$ of length $N$ of
$\mathcal{P}(\theta_1, \theta_2, y_0 = 0)$ have equal probabilities and match the sequences above. ■



Lemma 5.3. For each real $B > 1$ the pair $(\theta_1, \theta_2) = (\log_B \frac{3}{2}, \log_B \frac{1}{2})$ is two-dimensional
Diophantine with exponent 7.616.

Proof. We invoke a result of Rhin [22] (see inequality (8) there) obtained using Padé
approximation methods: There exists a positive constant $C$ such that for integers $u_0$, $u_1$ and $u_2$
with $\max(|u_1|, |u_2|) \ge 1$ we have

$$|u_0 + u_1 \log 2 + u_2 \log 3| \ge C \big( \max(|u_1|, |u_2|) \big)^{-7.616}. \tag{5.11}$$

Let $k$ be a large positive integer and suppose that $\ell_1$ is the nearest integer to $k\theta_1$ and that
$-\ell_2$ is the nearest integer to $k\theta_2$. Thus $|k\theta_1 - \ell_1| = \|k\theta_1\|$ and $|k \log_B 2 - \ell_2| = |k\theta_2 + \ell_2| = \|k\theta_2\|$.
Note that both $\ell_1$ and $\ell_2$ are positive and roughly of size $k$. On the one hand we see that

$$\Big| \frac{\log(3/2)}{\log 2} - \frac{\ell_1}{\ell_2} \Big| = \frac{|\ell_2 k \log_B (3/2) - \ell_1 k \log_B 2|}{k \ell_2 \log_B 2} \le \frac{\ell_2 \|k\theta_1\| + \ell_1 \|k\theta_2\|}{k \ell_2 \log_B 2}.$$

On the other hand we see that by (5.11)

$$\Big| \frac{\log(3/2)}{\log 2} - \frac{\ell_1}{\ell_2} \Big| = \frac{|\ell_2 \log 3 - (\ell_1 + \ell_2) \log 2|}{\ell_2 \log 2} \ge \frac{C (\ell_1 + \ell_2)^{-7.616}}{\ell_2 \log 2}.$$

Since $\ell_1$ and $\ell_2$ are roughly of size $k$, combining the above two statements immediately gives
the Lemma. ■
Proof of Theorem 2.1. We view the integer $B \ge 2$ and $N \ge 1$ as fixed. Consider the
realizations $\omega_N$ of the stochastic process $\mathcal{P}(\theta_1, \theta_2, y_0 = 0)$ with $\theta_1 = \log_B \frac{3}{2}$ and $\theta_2 = \log_B \frac{1}{2}$.
By Lemma 5.3 and Theorem 4.2 we obtain that (with $\alpha = 7.616$)

$$E_{\omega_N}\big[ D(\{y_1, \ldots, y_N\}) \big] \le C N^{-\frac{1}{2(1+\alpha)}} \le C N^{-\frac{1}{18}},$$

for an appropriate positive constant $C$. Using Markov's inequality that $\mathrm{Prob}[Y \ge a] \le \frac{E[Y]}{a}$
for a nonnegative random variable $Y$, we deduce that

$$\mathrm{Prob}\big[ D(\{y_1, \ldots, y_N\}) \ge N^{-\frac{1}{36}} \big] \le C N^{-\frac{1}{36}}.$$

Invoking Lemma 5.2 we conclude that the exceptional set of $m$ with $1 \le m \le 2^N$ such that
$D(\tilde{\mathcal{Y}}^*_N(m)) \ge N^{-\frac{1}{36}}$ has cardinality at most $C N^{-\frac{1}{36}} 2^N$. By Lemma 5.1 we know that for most
$1 \le m \le 2^N$ the sets $\mathcal{Y}_N(m)$ and $\tilde{\mathcal{Y}}_N(m)$ are very close term by term, and by Proposition 3.3
for such $m$ the discrepancies $D(\mathcal{Y}_N(m))$ and $D(\tilde{\mathcal{Y}}_N(m))$ are very nearly equal. Thus we may
deduce that the exceptional set of $m$ with $1 \le m \le 2^N$ such that

$$D(\mathcal{Y}_N(m)) \ge N^{-\frac{1}{36}} + 2^{2 - \frac{1}{100} N}$$

has cardinality at most

$$C N^{-\frac{1}{36}} 2^N + 2^{1 + \frac{99}{100} N}.$$

This easily gives the conclusion of the theorem for $X = 2^N$.

It remains to treat the case $X > 2^N$. Suppose $\ell 2^N < X \le (\ell + 1) 2^N$, for some $\ell \ge 1$.
Since the discrepancies $D(\tilde{\mathcal{Y}}_N(m))$ are periodic (mod $2^N$) we see that the exceptional set of
$m \le X$ with large discrepancy contains no more than $\ell + 1$ times the number of exceptional
$m \le 2^N$. This completes the proof. ■